Iterative refinement of lexicon and phrasal alignment

Authors

  • Jae Dong Kim
  • Stephan Vogel
Abstract

In a data-driven machine translation system, the lexicon is a core component. It is sometimes used directly in translation and sometimes used to build other resources, such as a phrase table. Until now, however, little attention has been paid to how the information contained in these resources can also be used in the reverse direction to help build or improve the lexicon. The system we propose here alternates between lexicon building and phrasal alignment. Evaluation on Arabic-to-English translation showed a statistically significant improvement of 1.5 BLEU points.
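
The alternation described in the abstract can be illustrated with a toy sketch. This is not the authors' system: the abstract does not specify their models, so the sketch assumes an IBM Model 1 style EM estimate for the lexicon and a simple best-link grouping as a stand-in for phrasal alignment. All function names (`estimate_lexicon`, `align_segments`, `refine`, `best_translation`) and the German-English toy corpus are hypothetical.

```python
from collections import defaultdict


def estimate_lexicon(segment_pairs, em_iterations=5):
    """Estimate p(target | source) with IBM Model 1 style EM (no NULL word)."""
    prob = {}
    for _ in range(em_iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in segment_pairs:
            for t in tgt:
                # Pairs without an estimate yet get a uniform score of 1.0.
                norm = sum(prob.get((s, t), 1.0) for s in src)
                for s in src:
                    c = prob.get((s, t), 1.0) / norm
                    count[(s, t)] += c
                    total[s] += c
        prob = {(s, t): c / total[s] for (s, t), c in count.items()}
    return prob


def align_segments(sentence_pairs, lexicon):
    """Assumed stand-in for phrasal alignment: link each target word to its
    best-scoring source word, then group target words by that source word."""
    segments = []
    for src, tgt in sentence_pairs:
        linked = defaultdict(list)
        for t in tgt:
            best = max(range(len(src)),
                       key=lambda i: lexicon.get((src[i], t), 0.0))
            linked[best].append(t)
        for i in sorted(linked):
            segments.append(([src[i]], linked[i]))
    return segments


def refine(sentence_pairs, rounds=3):
    """Alternate lexicon estimation and segment alignment for a few rounds."""
    segments = []
    lexicon = {}
    for _ in range(rounds):
        # The aligned segments feed back into the next lexicon estimate.
        lexicon = estimate_lexicon(sentence_pairs + segments)
        segments = align_segments(sentence_pairs, lexicon)
    return lexicon, segments


def best_translation(lexicon, source_word):
    """Return the target word with the highest p(target | source_word)."""
    candidates = {t: p for (s, t), p in lexicon.items() if s == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy corpus of (source tokens, target tokens) pairs.
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
lexicon, segments = refine(corpus)
```

In this sketch the feedback is deliberately simple: the segment pairs extracted in one round are added to the training data of the next, so the lexicon is re-estimated on finer-grained evidence each time. A real system would use a proper phrase alignment model and held-out evaluation such as BLEU.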

Similar resources

An Expert Lexicon Approach to Identifying English Phrasal Verbs

Phrasal Verbs are an important feature of the English language. Properly identifying them provides the basis for an English parser to decode the related structures. Phrasal verbs have been a challenge to Natural Language Processing (NLP) because they sit at the borderline between lexicon and syntax. Traditional NLP frameworks that separate the lexicon module from the parser make it difficult to...

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data

In this work, the use of a phrasal lexicon for statistical machine translation is proposed, and the relation between data acquisition costs and translation quality for different types and sizes of language resources has been analyzed. The language pairs are Spanish-English and Catalan-English, and the translation is performed in all directions. The phrasal lexicon is used to increase as well as...

Induction of Root and Pattern Lexicon for Unsupervised Morphological Analysis of Arabic

We propose an unsupervised approach to learning non-concatenative morphology, which we apply to induce a lexicon of Arabic roots and pattern templates. The approach is based on the idea that roots and patterns may be revealed through mutually recursive scoring based on hypothesized pattern and root frequencies. After a further iterative refinement stage, morphological analysis with the induced ...

Unsupervised Induction of a Syntax-Semantics Lexicon Using Iterative Refinement

We present a method for learning syntax-semantics mappings for verbs from unannotated corpora. We learn linkings, i.e., mappings from the syntactic arguments and adjuncts of a verb to its semantic roles. By learning such linkings, we do not need to model individual semantic roles independently of one another, and we can exploit the relation between different mappings for the same verb, or betwee...

Tone and accent in Saramaccan: Charting a deep split in the phonology of a language

Saramaccan, an Atlantic creole spoken in Surinam, has traditionally been analyzed as exhibiting a high-tone/low-tone opposition in its lexicon. However, while it is true that part of its lexicon exhibits a robust high/low opposition, the majority of its words are marked not for tone but pitch accent. The Saramaccan lexicon, therefore, is split with some words being marked for tone and other wor...

Publication date: 2007